Skip to content

Conversation

@frobion
Copy link

@frobion frobion commented Oct 1, 2025

There is a thread looping in the method PhysicalConnection.ReadFromPipe to process response from Redis, match them with the sent command and signaling the completion of the message. If this thread has an exception, its catch block will call RecordConnectionFailed which will proceed to restart a new thread to continue reading Redis responses.

However, if another exception occurred in the catch before the new thread can be started (in a case of high memory pressure, OOM exceptions can happen anywhere) we are in a state where no one is reading the pipe of Redis responses, and all commands sent end in timeout.

If at least one async command is sent, the heartbeat thread will detect the timeout in the OnBridgeHeartbeat method, and if no read were perform for 4 heartbeat it will issue a connection failure. However, no such protection were in place if only sync commands are sent. In this case, they were all ending in timeout without any mechanisms to start reading their responses again.

In this PR, the heartbeat thread will check timeouts for sync commands as well. Therefore, it will be able to start the thread looping in ReadFromPipe even if only sync commands are sent, ensuring we will not reach a state were all commands end in timeout.

This PR is loosely linked to the issue #2919, as this problem and its correction were found during investigation of the issue.
Moreover, it can be related to issue #2888 as the symptoms are similar to what I observed. However, I don't know if in this case only synchronous commands were sent or not.

There is a thread looping in the method
PhysicalConnection.ReadFromPipe to process response from Redis, match
them with the sent command and signaling the completion of the
message. If this thread has an exception, its catch block will call
RecordConnectionFailed which will proceed to restart a new thread to
continue reading Redis responses.

However, if another exception occurred in the catch before the new
thread can be started (in a case of high memory pressure, OOM
exceptions can happen anywhere) we are in a state where no one is
reading the pipe of Redis responses, and all commands sent end in
timeout.

If at least one async command is sent, the heartbeat thread will
detect the timeout in the OnBridgeHeartbeat method, and if no read
were perform for 4 heartbeat it will issue a connection failure.
With this commit, this becomes true for sync commands as well.
Therefore, it ensures we will not reach a state were all commands end
in timeout.
// This is an "always" check - we always want to evaluate a dead connection from a non-responsive sever regardless of the need to heartbeat above
if (timedOutThisHeartbeat > 0
var totalTimeoutThisHeartbeat = asyncTimeoutThisHeartbeat + syncTimeoutThisHeartbeat;
if ((totalTimeoutThisHeartbeat > 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is comparing sync timeouts but using async thresholds in the line below, so we might be introducing a case in which we've violating async timeout * 4, but only had 1 sync timeout that's not severe, creating more disconnects.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, fixed in 0ec7c4a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants